Blind Location of Phonetic Boundaries

Authors

R. Cmejla

P. Sovka

Abstract

This contribution addresses the location of phonetic boundaries (LPB) for Czech phonetic categories. A novel method based on a discriminant function and Bayesian change-point detectors (BCD) is proposed and tested on synthetic and real speech; the consistency and strength of the method were confirmed by experiment. The LPB process for finding significant boundaries consists of four steps: pitch-synchronous segmentation, signal parameterization using Bayesian evidence with polynomial and autoregressive models, discriminant function evaluation, and BCD application. The rate of correct boundary location is on average greater than 75% for continuous speech. A boundary time-location error of less than 6 ms can be achieved for affricate/vowel and burst/vowel boundaries, and for boundaries between silence and most of the phonetic categories.

INTRODUCTION

The detection, estimation and location of speech discontinuities (change-points) have been intensively studied for several decades. Many methods for speech segmentation based on various characteristics have been developed. The most widely used segmentation principles are likelihood analysis [1], hidden Markov models (HMM) [2], the Bayesian approach with an HMM method [3], the combination of the Bayesian approach with rules [4], and discrimination analysis [5]. This contribution explores the combination of discrimination analysis with Bayesian evidence (BE) [6] and Bayesian change-point detectors (BCD) [7]. The main motivation for this work was text-to-speech inventory acquisition. This approach requires the training of discriminant functions [9] for chosen speech classes, but this training is less extensive than the model training required when HMMs or neural nets are used. The LPB process consists of four steps. The first is modified pitch-synchronous segmentation [11], followed by signal parameterization using BE. Then a suitable discriminant function, used for segment concatenation, is applied.
Finally, two types of BCDs are used to locate the final boundaries. Signal parameters are estimated by a Bayesian model order selection algorithm and indicate how accurately one pitch period can be described using polynomial [8] or autoregressive models [6], [7], [8]. The parameter vector v of one pitch-period segment is then given by the BEs

$$\mathbf{v} = [\mathrm{PM}(0),\ \mathrm{PM}(4),\ \mathrm{AR}(1),\ \mathrm{AR}(8)]. \qquad (1)$$

The discriminant function [9] is determined for each vector v in (1). Two possible classes must be used for segment concatenation. The decision strategy is to associate the current frame with the past frame if both neighboring frames belong to the same class. No …
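The concatenation rule described above can be illustrated with a minimal Python sketch. It assumes that each pitch-synchronous frame has already been assigned one of two class labels by some discriminant function (not shown here) and that the start time of each frame is known. The names `concatenate_frames` and `candidate_boundaries`, the labels 0 and 1, and the millisecond start times are hypothetical and are not taken from the paper. The sketch yields only candidate boundaries at class changes; in the method described above, BCDs are then used to locate the final boundaries.

```python
from itertools import groupby


def concatenate_frames(frame_classes, frame_starts):
    """Merge consecutive frames that share a class label.

    frame_classes: one class label per pitch-synchronous frame (hypothetical 0/1).
    frame_starts:  start time of each frame, same length as frame_classes.
    Returns a list of (label, segment_start_time, number_of_frames).
    """
    if len(frame_classes) != len(frame_starts):
        raise ValueError("frame_classes and frame_starts must have equal length")
    segments = []
    index = 0
    # groupby yields runs of equal neighboring labels, which is exactly the
    # "associate the current frame with the past frame" rule.
    for label, run in groupby(frame_classes):
        run_length = len(list(run))
        segments.append((label, frame_starts[index], run_length))
        index += run_length
    return segments


def candidate_boundaries(segments):
    """A candidate boundary is the start of every segment except the first."""
    return [start for _, start, _ in segments[1:]]


# Illustrative input only: six frames with made-up labels and start times (ms).
classes = [0, 0, 1, 1, 1, 0]
starts = [0.0, 8.0, 16.5, 24.0, 31.0, 39.5]
segments = concatenate_frames(classes, starts)
boundaries = candidate_boundaries(segments)
```

Here the six frames collapse into three segments, and a candidate boundary is placed wherever the class label changes between neighboring frames.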

Similar resources

Acoustic-phonetic Cues to Word Boundary Location: Evidence from Word Spotting

This research examined acoustic-phonetic cues to word boundary location in French consonant clusters, and assessed their use in on-line lexical segmentation. Two word-spotting experiments manipulated the alignment between word targets and syllable boundaries. A perceptual cost of such misalignment was observed for obstruent-liquid clusters but not for /s/ + obstruent clusters. For the former cl...

"blind" Speech Segmentation: Automatic Segmentation of Speech without Linguistic Knowledge

A new automatic speech segmentation procedure, called "Blind" speech segmentation, is presented. This procedure allows a speech sample to be segmented into sub-word units without knowledge of any linguistic information (such as orthographic or phonetic transcription). Hence, this procedure involves finding the optimal number of sub-word segments in the given speech sample, before locatin...

Production of English Lexical Stress by Persian EFL Learners

This study examines the phonetic properties of lexical stress in English produced by Persian speakers learning English as a foreign language. The four most reliable phonetic correlates of English lexical stress, namely fundamental frequency, duration, intensity, and vowel quality were measured across Persian speakers’ production of the stressed and unstressed syllables of five English disyllabi...

Integrating phonetic boundary discrimina

In this study, we investigate methods of (a) detecting phonetic boundaries directly from acoustics, and (b) integrating these into HMM-based speech recognition. We test the hypothesis that detecting phone boundaries may be easier using phonological features rather than phonetic or direct acoustic information. We also show how HMMs can be more attuned to the transition of phone boundaries by exp...

Phonetic segmentation of singing voice using MIDI and parallel speech

When analyzing singing voice signal, it is required to know the boundaries of each phonetic unit in the singing voice samples. However, due to prolonged vowels in the singing voice, it is not easy to accurately align a singing voice with the phonetic sequence of its lyrics by conventional speech recognition approach. This paper proposes a solution for the phonetic annotation of the singing voic...

Publication date: 2001
